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Abstract 

We consider submodular optimization problems such as submodular function minimization! SFM) and 
the quadratic problems regularized by the Lovasz extension; for cut functions, this corresponds respectively 
to graph cuts and total variation (TV) denoising. Given a submodular function with an SFM oracle, we pro¬ 
pose a new active-set algorithm for total variation denoising, which is more flexible than existing ones; the 
algorithm may be seen as a local descent algorithm over ordered partitions with explicit convergence guar¬ 
antees. For functions that decompose into the sum of two functions Fi and F2 with efficient SFM oracles, 
we propose a new active-set algorithm for total variation denoising (and hence for SFM by thresholding 
the solution at zero). This algorithm also optimizes over ordered partitions and improves empirically over 
existing ones based on TV or SFM oracles for F\ and F2. 


1 Introduction 

Submodular optimization problems such as total variation denoising and submodular function minimization 
are convex optimization problems which are common in computer vision, signal processing and machine 
learning 12 , with notable applications to graph cut-based image segmentation 0 , sensor placement ED, or 
document summarization 1251 . 

In this paper, we consider a normalized submodular function F defined on V = {1,..., n} , i.e., F : 2 V —> R 
such that F(0) = 0 and an n-dimensional real vector it, i.e., u £ R". We aim at minimizing with respect to 

w G R": 

f(w)-u T w+l\\w\\l, (1) 

where / is the Lovasz extension of F. If F is a cut function in a weighted undirected graph, then / is its 
total variation, hence the denomination of total variation denoising problem for Eq. (jT]l, which we use in this 
paper—since it is equivalent to minimizing | ||it — iu||| + f(w). 

We also consider the general submodular function minimization (SFM) problem: 

min /( w) — u T w = min F(A) — 1 i(A), (2) 

iue[o,i]" acv 

where we use the convention u(A) = u t 1a, with 1,4 £ {0,1}” is the indicator vector of the set A. Our 
goal in this paper is to propose iterative algorithms to solve these two problems given certain oracles on the 
submodular function F. Note that our algorithms minimize general submodular functions as any submodular 
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function can be decomposed into a normalized submodular function, F, i.e., F(0) = 0 and a modular 
function, u( see HI). 

Relationship with existing work. Generic algorithms to optimize SFM in Eq. (|2]» or TV in Eq. ([T} problems 
which only access F through function values (e.g., subgradient descent or min-norm-point algorithm) are too 
slow without any assumptions J21, as for signal processing applications, high precision is typically required 
(and often the exact solution). 

For decomposable problems, i.e., when F = h\ + ■ ■ ■ + F r , where each Fj is “simple”, the algorithms use 
more powerful oracles than function evaluations improving the running times. Ii30ll used SFM oracles instead 
of function value oracles but they still remain slow in practice. However, when total variation oracles for each 
Fj are used, they become competitive ll20ll22lfl8l . Note that, in general, the exact total variation oracles are 
at most 0(n ) times expensive than their SFM oracles. However, there are a subclass of submodular functions 
(cut functions and other submodular functions that can be written in form of cuts) whose total variation 
oracles are only 0(1) times expensive than the corresponding SFM oracles but are still too expensive in 
practice.Therefore, the goal is to design fast optimization strategies using efficient SFM oracles for each 
function Fj rather than their expensive TV oracles E21IT81 to solve the SFM and TV denoising problems of 
F given by Eq. (f2| and Eq. (|T]» respectively. An algorithm was proposed by ll24l to search over partition space 
for solving Eq. (TTh with the unary terms (—it 1 w) replaced by a convex differentiable function but restricting 
to cut functions. 

In this paper, we exploit the polytope structure of these non-smooth optimization problems, where each face 
is indexed by an ordered partition of the underlying ground set V = {1,..., n}. The main insight of this 
paper is that once given a face of the main polytope associated with the submodular function (namely the 
base polytope described in Section]^ and its tangent cone, orthogonal projections may be done in linear time 
by isotonic regressions. We will only need SFM oracles, i.e., the minimization of F(A) — s(A) with respect 
toACF for all possible s £ R™, to check optimality of this partition and/or generate a new partition. 

Contributions. We make two main contributions: 

— Given a submodular function F with an SFM oracle, we propose a new active-set algorithm for total 
variation denoising in Section[3] which is more efficient and flexible than existing ones (e.g., it allows 
warm restarts). This algorithm may be seen as a local descent algorithm over ordered partitions. 

— Given a decomposition of F = h\ + hg, with available SFM oracles for each Fj, we propose an active- 
set algorithm for total variation denoising in Section[4](and hence for SFM by thresholding the solution 
at zero). These algorithms optimize over ordered partitions (one per function Fj). Following USED, 
they are also naturally parallelizable. Given that only SFM oracles are needed, it is much more flexible, 
and allow more applications as shown in Section[5] 


2 Review of Submodular Analysis 

A set-function F : 2 V —)-Ris submodular if F(A) +F(B) ^ F(AUB) + F(AnB) for any subsets A, B of 
V. Our main motivating examples in this paper are cuts in a weighted undirected graph with weight function 
a : V x V —» R+, which can be defined as 

F(A) = Y,a(iJ)\lieA-ljtAl (3) 

i<j 

where i,j £ V. Note that there are other submodular functions on which our algorithm works, e.g., concave 
function on the cardinality set, which cannot be represented in the form of Eq. 0 ED E9). However, we 
use cut functions as a running example to better explain our algorithm as they are most widely studied and 
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Figure 1: Base polytope for n = 3. (a) definition from its supporting hyperplanes {s(A) = F(A)}. (b) each 
face (point or segment) of B(F) is associated with an ordered partition. 


understood among submodular functions. We now review the relevant concepts from submodular analysis 
(for more details, see Emu). 

Lovdsz extension and convexity. The power set 2 V is naturally identified with the vertices {0,1}” of the 
hypercube in n dimensions (going from A C V to 1 4 £ {0,1}”). Thus, any set-function may be seen as a 
function / on {0, l} n . It turns out that / may be extended to the full hypercube [0, l] n (and then to R ra ) by 
piecewise-linear interpolation, and then to the whole vector space R". 

Given a vector w £ R", and given its unique level-set representation as w = YJiLx v i^Ai, with (Ai,... ,A m ) 
a partition of V and V\ > ■ ■ ■ > v m , f(w ) is equal to f(w) = YhL 1 v % ~ F(Bi- 1 )], where B t = 

(A\ IJ ... U A, ). For cut functions, the Lovasz extension happens to be equal to the total variation , f(w) = 
Y%< :j a (h j)\ w i ~ w j |, hence our denomination total variation denoising for the problem in Eq. (JTji. 

This extension is piecewise linear for any set-function F. It turns out that it is convex if and only if F is 
submodular l26l . Any piecewise linear convex function may be represented as the support function of a 
certain polytope K, i.e., as f(w) = niax s(= K w 1 .s E281 . For the Lovasz extension of a submodular function, 
we have an explicit description of K, which we now review. 

Base polytope. We define the base polytope as 

B(F) = {seR n , s{V) = F(V), VA c V,s(A) ^ F(A)}. 

Given that it is included in the affine hyperplane { s(V) = F(V)}, it is traditionally represented projected 
on that hyperplane (see Figure[I|(a)). A key result in submodular analysis is that the Lovasz extension is the 
support function of B(F), that is, for any w £ R", 

f(w) = sup w T s. (4) 

sGB(F) 

The maximizers above may be computed in closed form from an ordered level-set representation of w using 
a greedy algorithm, which (a) first sorts the elements of w in decreasing order such that > ... > 

w tr(n) where er represents the order of the elements in V; and (b) computes = i r ({o'(l),..., cr(fc)}) — 
F({a(l),...,a(k-1)}). 

SFM as a convex optimization problem. Another key result of submodular analysis is that minimizing a 
submodular function F (i.e., minimizing the Lovasz extension / on {0,1}"), is equivalent to minimizing the 
Lovasz extension / on the full hypercube [0,1]" (a convex optimization problem). Moreover, with convex 
duality we have 

min F(A) — u(A ) = min f(w) — u T w 

AC.V uiG{0,l} n 
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min f(w) — u T w 

™e[o,i]" 

min max s T w — u T w 

iue[0,l]" s£B(F) 

max min s T w — u T iv 

s£B(F) lueJO,!]” 


= max > — Ui, 0}. 


This dual problem allows to obtain certificates of optimality for the primal-dual pairs w £ [0, l] n and s £ 
B{F ) using the quantity 

T n 

gap(w, s) := f(w) - u w - min{sj - u t , 0}, 

which is always non-negative. It is equal to zero only at optimal and the corresponding (w, s ) form the 
optimal primal-dual pairs. 

Total variation denoising as projection onto the base polytope. A consequence of the representation of / as 
a support function leads to the following primal/dual pair (2) Sec. 8]: 


min f(w) — u T w+b\\w\\% 
™e»" 2 


min max s T w — u T w 

sGB(F) 


+ \||will using Eq. 0. 


max min s T w — u T w + ill-ufilo, 
s eB(F)«>ei" 211 112 


max — ills — wllo, 

seB(F) 211 112 


(5) 


with w = u — s at optimality. Thus the TV problem is equivalent to the orthogonal projection of u onto 
B(F). 

From TV denoising to SFM. The SFM problem in Eq. 0 and the TV problem in Eq. 0 are tightly connected. 
Indeed, given the unique solution w of the TV problem, then we obtain a solution of min^cv F(A) — u{A) 
by thresholding w at 0, i.e., by taking A = {i £ V, Wi ^ 0} fl3l . 

Conversely, one may solve the TV problem by an appropriate sequence of SFM problems. The original 
divide-and-conquer algorithm may involve O(n) SFM problems JT7). The extended algorithm of lfl8l can 
reach a precision e in 0(log j) but can only get the exact solution in O(n) oracles. Fast efficient algorithms 
are proposed to solve TV problems with 0(1) oracles ITT! fl6 1 but are specific to cut functions on simple 
graphs (chains and trees) as they exploit its weight representation given by Eq. 0. Our algorithm in Section[3] 
is a generalization of the divide-and-conquer strategy for solving the TV problem with general submodular 
functions. 


3 Ordered Partitions and Isotonic Regression 

The main insight of this paper is (a) to consider the detailed face structure of the base poly tope B(F) and (b) 
to notice that for the outer approximation of B(F) based on the tangent cone to a certain face, the orthogonal 
projection problem (which is equivalent to constrained TV denoising) may be solved efficiently using a 
simple algorithm, originally proposed to solve isotonic regression in linear time. This allows an explicit 
efficient local search over ordered partitions. 


4 




Figure 2: Projection algorithm for a single polytope: first projecting on the outer approximation 
Bd 2 ’ 3 }>{i})(F), with a projected element which is not in B(F) (blue), then on _b(( 2 F(3},{i}) w ith a 
projected element being the projection of s onto B{F ) (red). 


3.1 Outer approximations of B(F) 


Supporting hyperplanes. The base polytope is defined as the intersection of half-spaces { s(A ) ^ F(A)}, 
for A C V. Therefore, faces of B(F) are indexed by subsets of the power set. As a consequence of 
submodularity EH3, the faces of the base poly tope B{F) are characterized by “ordered partitions” A = 
(Ai ,..., A m ) with V = A\ U • • • U A m . Then, a face of B(F ) is such that s(Bf) = F(Bi) for all Bi = 
A\ U • • • U Ai, i = 1 ,m. See the Figure |T] (b) for the enumeration of faces for n = 3 based on an 
enumeration of all ordered partitions. Such ordered partitions are associated to vectors w = YhLi vAa, 
with V\ > ... > v m with all solutions of max s(=/i( - F ~ t w T s being on the corresponding face. 

From a face of B(F) defined by the ordered partition A, we may define its tangent cone B A (F) at this face 
as the set 

B a (F) = js G r\ a(V) = F(V),Wi g {1,..., m - 1 }, s(Bi) < F(^) J. (6) 

These are outer approximations of B(F), as illustrated in Figure |2]for two ordered partitions. 

Support function. We may compute the support function of B A (F), which should be an upper bound on 
f(w) since this set is an outer approximation of B(F) as follows: 


sup w T s 

sGB^iF) 


sup inf w T s - Y' A i(s(Bi) - F(Bi)) 

seR"Ae»™ _1 xK 

(using Lagrangian duality), 

m 

inf sup s T (w - y^(\i H-hA m )lA,) 

AeR+ 1 - 1 xRseR" v 7 

m 

+ + • • • + A m ) [ F(Bi ) — F(Bi- 1)], 

i= 1 

m 

inf V(A, + • • • + A m ) [F(Bi) - F(Bi-i)] 
xeRT xR : 

+ i=i 

m 

such that w = y^(Ai+• • •+ x m )iAi- 

i=i 


5 








Thus, by defining Vi = A, + ■ ■ ■ + A m , which are decreasing, the support function is finite for w having 
ordered level sets corresponding to the ordered partition A (we then say that w is compatible with A), i.e., 
w = Y^iLi Vil Ai', the support functions is equal to the Lovasz extension f(w). Otherwise, when w is not 
compatible with A, the support function is infinite. 

Let us now denote W A as a set of all weight vectors w that are compatible with the ordered partition A. This 
can be defined as 


Therefore, 


m 

W A = {w £ R ra | 3v £ R m ,w = > ... > v m }. 

»=1 


sup u) T s = 



if to e W A , 
otherwise. 


(7) 


3.2 Isotonic regression for restricted problems 

Given an ordered partition A = (A t ...., A m ) of V, we consider the original TV problem restricted to w in 
W A . Since on this constraint set f(w) = YiiiLi v i [-F(-Bi) — F{B i _x) 1 \ is a linear function, this is equivalent 
to 

m 

min ^2 Vi [F(Bi) - F(B i _ 1 ) - w(A, ; )] + \ Yh= i I A i\ v i such that v i > ■ ■ ■ > v m . (8) 

V i=l 

This may be done by isotonic regression in complexity 0(m) by the weighted pool-adjacent-violator algo¬ 
rithm Q- Typically the solution v will have some values that are equal to each other, which corresponds 
to merging some sets A, . If these merges are made, we now obtain a basic ordered partitioi^J such that 
our optimal w has strictly decreasing values. Primal stationarity leads to explicit values of v given by 
Vi = u(Ai)/\Ai\ — ( F(Bi ) — F(Bi-i))/\Ai\, i.e., given A, the exact solution of the TV problem may 
be obtained in closed form. 

Dual interpretation. Eq. (|8]» is a constrained TV denoising problem that minimises the cost function in Eq. <jT]) 
but with the constraint that weights are compatible with the ordered partition A, i.e. mm we w A f(w)—u T w+ 
h IMII- The dual of the problem can be derived exactly the same way as shown in Eq. (pi in the previous 
section, using the definition of the support function defined by Eq. ([7]). The corresponding dual is given by 
max sg g. 4 ^ —71 |s — u|||, with the relationship w = u — s at optimality. Thus, this corresponds to projecting 

u on the outer approximation of the base poly tope, B A (F ), which only has m constraints instead of the 2" — 1 
constraints defining B(F). See an illustration in Figure [ 2 ] 


3.3 Checking optimality of a basic ordered partition 

Given a basic ordered partition A, the associated w £ R n is optimal for the TV problem in Eq. (|T]> if and 
only if s = u — w £ B(F) due to optimality conditions in Eq. (J5ji, which can be checked by minimizing the 
submodular function F—s. For a basic partition, a more efficient algorithm is available. 

By repeated application of submodularity, we have for all sets C C V, if Cj = C (T A,: 

1 Given a submodular function F and an ordered partition A, when the unique solution problem in Eq. JIJ is such that r i > ■ ■ ■ > v rn , 
we say that we A is a basic ordered partition for F — u. Given any ordered partition, isotonic regression allows to compute a coarser 
partition (obtained by partially merging some sets) which is basic. 
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F(C) - s(C) 


= F(V D C) — s(Ci) (as s is a modular function), 

2=1 
m 

= F(B m n C) - ^ a(Ci) 

m —1 2 — 1 

+ ^ nc)- F(Bi n C) (as B m = V), 

2=1 
m 

= J2 F ^ n C)-F(B i _ 1 nC)-s(C i ) 
i ~ 1 (let Bo = 0 and as F(0) = 0), 

m 

= J2 u n c ) - w-in c) - *(<3) 

i=1 (since Bi = Bi_\ U A,), 

m 

= J2 n c) u (A, n c)) - F(Bi_i n c) - s(a), 

2=1 
m 

= J2 F (( B *-1 n c) u Ci) - F(Bi-i nc) - s(Ci), 

2=1 
m 

> ^2 [F( B i-! U Cj) - F(B i ^ 1 ) - s(C'j)] 

t=i 

(as D C) C Bi_ i and due to submodularity of i 7 ). 

Moreover, we have s(A,) = F(Bi) — F(Bi_ i), which implies s(f?i) = F(Bi) for all i £ {1,..., to}, and 
thus all subproblems min^cAi F(Bi -i U Cj) — F(Bi- 1 ) — s(C'j) have non-positive values. This implies 
that we may check optimality by solving these to subproblems: s is optimal if and only if all of them have 
zero values. This leads to smaller subproblems whose overall complexity is less than a single SFM oracle 
call. Moreover, for cut functions, it may be solved by a single oracle call on a graph where some edges have 
been removed ED- 

Given all sets C}, we may then define a new ordered partition by splitting all /l, for which t U C,) — 

F(Bi_ i) — s(Ci) < 0. If no split is possible, the pair (w,s) is optimal for Eq. (JTJ. Otherwise, this new 
strictly finer partition may not be basic, the value of the optimization problem in Eq. 0 is strictly lower as 
shown in Section [375] (and leads to another basic ordered partition), which ensures the finite convergence of 
the algorithm. 


3.4 Active-set algorithm 

This leads to the novel active-set algorithm below. 

— Input: Submodular function F with SFM oracle, u £ EE", ordered partition A 

— Algorithm: iterate until convergence 

(a) Solve Eq. 0 by isotonic regression. 

(b) Merge the sets with equal values of v t to define a new ordered partition A. Define w = J}l=r Vl 1 A * 
and s = u — w. 

(c) Check optimality by solving min^cA; i'X-Bj-i U Ci) — F(Bi-\) — s(Ci) for i £ to}. 
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(d) If s not optimal, for all (7, which are different from 0 and A ,, add the new set B, _ i U C, in the 
ordered partition A. 

— Output: w £ R” and s € B(F). 

Relationship with divide-and-conquer algorithm. When starting from the trivial ordered partition A = (V), 
then we exactly obtain a parallel version of the divide-and-conquer algorithm tm that is, the isotonic regres¬ 
sion problem in (a) is always solved without using the constraints of monotonicity, i.e., there are no merges 
in (b), and thus in (c), it is not necessary to re-solve the problems where nothing has changed. This shows 
that the number of iterations is then less than n. The key added benefits in our formulation is the possibility 
of warm-starting, which can be very useful for building paths of solutions with different weights on the total 
variation. This is also useful for decomposable functions where many TV oracles are needed with close-by 
inputs. See experiments in Section[5] 


3.5 Proof of convergence 


In order to prove convergence of the algorithm, we only need to show that if the optimality check fails in step 
(c), then step (d) introduces splits in the partition, which ensures that the isotonic regression in step (a) of the 
next iteration has a strictly lower value. Let us recall the isotonic regression problem solved in step (a): 


min 

tieR m 



F(B l _ 1 ) - u(A0] 



such that V\ ^ ^ v m . 


(9) 

( 10 ) 


Steps (a-b) ensures that the ordered partition A is a basic ordered partition warranting that the inequality 
constraints are strict, i.e., no two partitions have the same value Vi and the values v t for each element of the 
partition * = m} is given through 


Vi\Ai\ = u(Ai) - (F(Bi) - F(Bi-i)), 


( 11 ) 


which can be calculated in closed form. 

The optimality check in step (c) decouples into checking the optimality in each subproblem as shown in 
Section [O] If the optimality test fails, then there is a subset of C, of A, for some of elements of the partition 
A such that F{Bi_\ UG,) — F(Bi_i) — s{Ci) < 0. We will show that the splits introduced by step fd) strictly 
reduces the function value of isotonic regression in Eq. & while maintaining the feasibility of the problem. 
The splits modify the cost function of the isotonic regression as follows, as the objective function in Eq. 0 
is equal to 


Y, ( v i [W-i u ) - HBi-i) - u{Ci)] + Vi [ F(B t ) - FiB^ U (?<) - u{Ai \ Q)] 

i =1 ^ 

+ ^ y f\C,\ + ^ v f\A i \C i \ S j. ( 12 ) 

Let us assume a positive t £ R, which is small enough. The direction that the isotonic regression moves is 
■(!,; + / for the partition corresponding to C, and v, -1 for the partition corresponding to A, \ Ci maintaining 
the feasibility of the isotonic regression problem, i.e., v\ ^ ^ Vi + t > Vi — t ^ ^ v m . The function 

value is given by 
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53 (( w < + *) [ F ( B i- 1 u - W-i) - u(C'i)] + («i - *) [-F’(-Bi) - F(Bi_ i U Ci) - u(Ai \ C*)] 

2=1 ' 

+ 5(^1 + 0 2 |Cj| + \{ v i ~ t) 2 \A.i \ C^ 

m , 

= 53 ( («i [F(Bi-i U Ci) - F(Bi-i) - u(Ci )] + Vi [F(Bi) - F(B i _ 1 U Ci) - u(A; \ Ci)] 

2=1 ' 

+ \v 2 \ c i\ + \Ci\) 

+t (2F(Bi_! U Q) - F(Bi_i) - F(Bi) - u(C<) + u{At \ Cf) + Vi \Ci\ - Vi\A t \ C t |) 
+\t 2 \Ai\^j. 


From this we can compute the directional derivative of the function at t = 0, which is given by 

2 F(Bi-i U Q) - F(Bi-i) - F{Bi) - u(Ci) + u(A z \ C z ) + | Ci\ Vi - |A 4 \ C t \vi 
= 2U Ct) - F(Bi^) - F{Bi) - 2u{C l ) + u(A t ) + 2|C , i |« i - | A\ Vi 
= 2 (F(Bi^i U Ci) - F{Bi_ 1 ) - u(Ci) + Vi\Ci\) (substituting Eq. ( |TT| ) 

= 2(F(B*_i U Ci) - F(Bi_ 1 ) - s(Ci )) < 0 (as s = u - w and Eq. (g3)). 

This shows that the function strictly decreases with the splits introduced in step (d). 


3.6 Discussion 


Certificates of optimality. The algorithm has dual-infeasible iterates s (they only belong to B{F) at con¬ 
vergence). However, after step (c), we have that for all C C V, F(C) — s{C) f —e. This implies that 
s € B{F + elcarde(i,n))> ie., s €= B {F e ) with F e = F + elcarde(i.n)- Since by construction w = u — s, 
we have: 


f e (u>) - u T w + 3 IMI 2 + Ills - u\\l 


el maxTOj — min Wj I + f(w) — u T w + ||m|| 2 
jev jev 

e\ maxtt/i — minic, 

1 jev J jev J1 


m m 

+ 53 Vi [F{Bi) - F(Bi^) - u(Ai)\ + J2 I Ai\v\ 

i— 1 t= 1 


£ maxru, — min to,- 
1 jev jev 


(using Eq. ( [TT] )) 


£ range(w), 


where range(«;) = ma Xk^v Wk — txim.kev w k- This means that w is approximately optimal for f(w) — 
u T w + ^||w||| with certified gap less than £range(w) + £ range (w*). 

Maximal range of an active-set solution. For any ordered partition A, and the optimal value of w (which we 
know in closed form), we have range(w) ^ range(rt) + max^gy {.F({i}) + T"(V r \{i}) — F(V)}. Indeed, 
for the u part of the expression, this is because values of w are averages of values of u\ for the F part of the 
expression, we always have by submodularity: 


F(Bi)-F(Bi_ r) < 53 F({k}) and 

kaAi 
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F{Bi) - F(Bi-i) A - J2 F iV) - F(V\{k}). 

keAi 

This means that the certificate can be used in practice by replacing range(w*) by its upperbound. 

Exact solution. If the submodular function only takes integer values and we have an approximate solution of 
the TV problem with gap e ^ -F, then we have the optimal solution llOl . 

Relationship with traditional active-set algorithm. Given an ordered partition A. an active-set method solves 
the unconstrained optimization problem in Eq. (|p) to obtain a value of v using the primary stationary condi¬ 
tions. The corresponding primal value w = JV =1 W *1 A t and dual value s = u — w are optimal, if and only 
if. 


Primal feasibility : w £ W A , (13) 

Dual feasibility : s £ B(F). (14) 

If Eq. ( p~3] > is not satisfied, a move towards the optimal w is performed to ensure primal feasibility by per¬ 
forming line search, i.e., two consecutive sets A, and A, + \ with increasing values of v, i.e., v t < v 2+ i are 
merged and a potential w is computed until primal feasibility is met. Then dual feasibility is checked and 
potential splits are proposed. 

In our approach, we consider a different strategy which is more direct and does many merges simultaneously 
by using isotonic regression. Our method explicitly moves from ordered partitions to ordered partitions and 
computes an optimal vector w, which is always feasible. 


4 Decomposable Problems 

Many interesting problems in signal processing and computer vision naturally involve submodular functions 
F that decompose into F = F\ + ■ • ■ + F r , with r “simple” submodular functions CD- For example, a cut 
function in a 2D grid decomposes into a function F\ composed of cuts along vertical lines and a function F 2 
composed of cuts along horizontal lines. For both of these functions, SFM oracles may be solved in 0(n) by 
message passing. For simplicity, in this paper, we consider the case r = 2 functions, but following 112011181 . 
our framework easily extends to r > 2. 


4.1 Reformulation as the distance between two poly topes 

Following m. we have the primal/dual problems : 


min fi{w) + f 2 (w) - u T w + \ \ 


'LUG® 


w llo = min max ■u; T (s 1 + s 2 ) — u T w + ^ llwllo 

1 «eK" Sl GB(F 1 ), s 2 GB(F 2 ) 2 " 


max 

SiEB(Fi) , S2^B(F2) 


min (si + s 2 — u) T w + §|M |2 


max 

SlGS(Fi), S2^lB{F2) 


-ij||Sl+S 2 -u||!, 


(15) 


with w = u — s i — s 2 at optimality. 

This is the projection of u on the sum of the base polytopes B(F\) 4- B(F 2 ) = B(F). Further, this may be 
interpreted as finding the distance between two polytopes B(Fi) — u/2 and u/2 — B(F 2 ). Note that these 
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Figure 3: Closest point between two polytopes. Top: output of Dykstra’s alternating projection algorithm for 
the TV problem, the pair (si, s 2 ) may not be unique while w = S] + u is. Bottom: Dykstra’s alternating 
projection output for outer approximations. 


two polytopes typically do not intersect (they will if and only if w = 0 is the optimal solution of the TV 
problem, which is an uninteresting situation). 


Alternating projections (AP). The alternating projection algorithm j4j was proposed to solve the convex 
feasibility problem, i.e., to obtain a feasible point in the intersection of two polytopes. It is equivalent to 
performing block coordinate descent on the dual derived in Eq. (15 1 . Let us denote the projection onto a 
polytope K as II k, he., Yixijj) = argmax Igif —||a; — t/|||. Therefore, alternating projections lead to the 
following updates for our problem. 


Zt — ^u/2-B(F 2 )^B(F 1 )-u/2{Zt-l), 

where z 0 is an arbitrary starting point. Thus each of these steps require TV oracle for f j and F 2 since 
projection onto the base polytope is equivalent to performing TV denoising as shown in Eq. ([5]). 

Averaged alternating reflections (AAR). The averaged alternating reflection algorithm 0 , which is also 
known as Douglas-Rachford splitting can be used to solve convex feasibility problems. It is observed to 
converge quicker than alternating projection |T8l [221 in practice. We now introduce a reflection operator for 
the polytope K as Rk, i.e., Rk = 211^- — /, where I is the identity operator. Therefore, reflection of t on 
a polytope K is given by Rx{t) = 211 ^(t) — t. The updates of each iteration of the averaged alternating 
reflections, which starts with an auxiliary sequence Zo initialized to 0 vector, are given by 

z t = + R u /2-B(F 2 )^B(F 1 )-u/2)( z t-l)- 

In the feasible case, i.e., intersecting polytopes, the sequence z t weakly converges to a point in the intersection 
of the polytopes. However, in our case, we have non intersecting polytopes which leads to a converging 
sequence of z t with AP but a diverging sequence of z t with AAR. However, when we project z t by using the 
following projection operation, i.e, s M = H B (F 1 )-u/ 2 {z t ); s 2 ,t = H u /2- B (F 2 )( s i,t) the sequences si jt and 
s 2 t converge to nearest points on the polytopes, B(Fi) — u/ 2 and u/2 — B(F 2 ) 0. 

Dykstra’s alternating projections (DAP). Dykstra’s alternating projection algorithm [0 retrieves a convex 
feasible point closest to an arbitrary point, which we assume to be 0. It can also be used and has a form of 
primal descent interpretation, i.e., as coordinate descent for a well-formulated primal problem m. Let us 
denote lk as the indicator function of a convex set K. In our case we consider finding the nearest points on 
the polytopes B(Fi) — u/2 and u/2 — B(F 2 ) closest to 0, which can be formally written as: 


min 

s£B(Fi)— § 
7^ — 5(-F 2 ) 



2 

2 


min -HI* + i B (p’ 1 )_|(s) + l»_ B (f 2 )(s) 
min l[[s||| + l B ( F i )-% (s) + l B (f 2 )~ |(-s) 
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• 1 II II2 T n / x wJu T , , . wJ\ 

mm d|s|| 2 + max w 1 s - /i(wi) + —- + max -w 2 s - f 2 {w 2 ) + — 
sGK 71 2, tH^GM 71 ,Z iu 2 G M 71 

nr \ , , . (wi+W 2 ) T W . 1 || 1,2 / xT 

max -/i wi - f 2 (w 2 ) + - - -+ mm - s 2 + {wi - w 2 ) s 

w iGM Z sGM Z 

W2 GM 71 

r ( X , / X . (u>l +W 2 ) T U 1 2 

max -/i(wi) - f 2 {w 2 ) + ----||wi - w 2 || 2 

w iGM Z Z 

W 2 GlK 71 

min /i(wi) + / 2 (w 2 ) - + - w 2 || 2 , 

w iGM. Z Z 

iz;2 GM n 


where s = w 2 — w\ at optimality. The block coordinate descent gives 


w l,t — P rox f 1 - u /2( w 2,t-l) — {I - ^-B(F 1 )-u/l){w 2 ,t-\), 
s l,t = W 2 t t-1-W lt t, 

w 2 ,t = prox /2 _ u/2 (wi,t) = (/-n B(F2) _ u/2 )(wi )t ), 

82 ,t = w l,t - W 2 ,u 


where I is the identity matrix. This is exactly the same as Dykstra’s alternating projection steps. 

We have implemented it, and it behaves similar to alternating projections, but it still requires TV oracles for 
projection (see experiments in Section [5j. There is however a key difference: while alternating projections 
and alternating reflections always converge to a pair of closest points, Dykstra’s alternating projection algo¬ 
rithm converges to a specific pair of points, namely the pair closest to the initialization of the algorithm 0; 
see an illustration in Figure[3](top). This insight will be key in our algorithm to avoid cycling. 

Assuming TV oracles are available for F\ and F 2 , |T8]f22| use alternating projection |[4] and alternating re¬ 
flection 0 algorithms. However, these algorithms are equivalent to block dual coordinate descent and cannot 
be be cast explicitly as descent algorithms for the primal TV problem. On the other hand, Dykstra’s alternat¬ 
ing projection is a descent algorithm on the primal, which enables local search over partitions. Complex TV 
oracles are often implemented by using SFM oracles recursively with the divide-and-conquer strategy on the 
individual functions. Using our algorithm in Section [3A| they can be made more efficient using warmstarts 
(see experiments in Section[5]). 


4.2 Local search over partitions using active-set method 


Given our algorithm for a single function, it is natural to perform a local search over two partitions A\ and 
A 2 , one for each function Fj and F 2 , and consider in the primal formulation a weight vector w compatible 
with both Ai and A 2 ; or, equivalently, in the dual formulation, two outer approximations B A] (FA and 
U A ' 2 (F 2 ). That is, given the ordered partitions A \ and A 2 , using a similar derivation as in Eq. (15 I, we 
obtain the primal/dual pairs of optimization problems 


max -|||u - si - S2II2 = /i( w ) + ~ u w + IlMli 

s^B a i(Fi) wew A 2 

s 2 gb- A2 (f 2 ) 


with w = u — s 1 — s 2 at optimality. 


Primal solution by isotonic regression. The primal solution w is unique by strong convexity. Moreover, 
it has to be compatible with both A \ and A 2 , which is equivalent to being compatible with the coalesced 
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ordered partition A = coalesce (_4|. _4 2 ) defined as the coarsest ordered partition compatible by both. As 
shown in Appendix |A| A may be found in time 0(min(mi, m 2 )n). 


Given A, the primal solution w of the subproblem may be found by isotonic regression like in Section 3.2 


time 0(m) where m is the number of sets in A. However, finding the optimal dual variables si and s 2 turns 
out to be more problematic. We know that si + s 2 = u — w and that si + s 2 £ B A {F), but the split of 
si + S 2 into (si, S 2 ) is unknown. 


Obtaining dual solutions. Given ordered partitions A\ and A 2 , a unique well-defined pair (si,S 2 ) can 
be obtained by using convex feasibility algorithms such as alternating projections |[4j or alternating reflec¬ 
tions 0. However, the result would depend in non understood ways on the initialization, and we have 
observed cycling of the active-set algorithm. Using Dykstra’s alternating projection algorithm allows us to 
converge to a unique well-defined pair (si, S 2 ) that will lead to a provably non-cycling algorithm. 

When running the Dykstra’s alternating projection algorithm starting from 0 on the polytopes B Al (f j) — 
u/2 and u/2 — B A2 {F 2 ), if w is the unique distance vector between the two polytopes, then the iterates 
converge to the projection of 0 onto the convex sets of elements in the two poly topes that achieve the minimum 
distance 0. See Figure [3] (bottom) for an illustration. This algorithm is however slow to converge when the 
polytopes do not intersect. Note that w ^ 0 in most of our situations and convergence is hard to monitor 
because primal iterates of the Dykstra’s alternating projection diverge 0. 


Translated intersecting polytopes. In our situation, we want to reach the solution while knowing the vector 
w (as mentioned earlier, it is obtained cheaply from isotonic regression). Indeed, from Lemma 2.2 and 
Theorem 3.8 from 0, given this vector w , we may translate the two polytopes and now obtain a formulation 
where the two polytopes do intersect; that is we aim at projecting 0 on the (non-empty) intersection of 
B a ' (F [) — u/2 + w/2 and u/2 — w/2 — B A2 (F 2 ). See Figure [dj We also refer to this as the translated 
Dykstra problen ^ in the rest of the paper. This is equivalent to solving the following optimization problem 


ses Al (-F\)-s=» 

se=-B A 2(F 2 ) 


fl|S|l2 


= 511*111 + t S A l{Fl) _^(s) + tu-u,_ 


se 


-B a 2(F 2 ) 


(s), 


2 + L B- A 2 (F 2 )- li 2 Y !i ( S )’ 

= m i n f IlMli + otax^igw-Ai wjs - + Wl ( 2 ~ w) 

T r / \ W 2 ( u ~ w ) \ 

+ max -W2 s - / 2 (w2) H-x- , 

w 2 €W a 2 Z ) 

f ,, \ ft \ +w 2 ) t (u- w) 

= max - /i(wi) - } 2 {w 2 ) +- - - 

mi€W a i \ Z 

W2 ew A z ^ s 

+ min 9 Il s ll2 + ( w i -W2 ) T s), 

sGM 71 Z ) 


max -/i(mi) -/ 2 (w 2 )+ 
Wl ew Al \ 

w 2 GW a 2 ^ 


(wi + w 2 ) T (m ~ w) 
2 


--HWi - w 2H2 ], 


2 We refer to finding a Dykstra solution for translated intersecting poly topes as translated Dykstra problem 
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Figure 4: Translated intersecting polytopes. Top: output of our algorithm before translation. Bottom: 
Translated formulation. 


ft , s . t ( \ (wi+w 2 ) t (u 

mm /i(wi) + } 2 (w 2 ) - ~ - 

GW- 4 ! \ 2 

GW 4 2 

+2^1 - Ml ), 


w) 


(16) 


with s = w 2 — Wi at optimality. 


In Section 4.3 we propose algorithms to solve the above optimization problems. Assuming that we are able 
to solve this step efficiently, we now present our active-set algorithm for decomposable functions below. 


4.3 Active-set algorithm for decomposable functions 

— Input: Submodular function Fj and F 2 with SFM oracles, u £ W , ordered partitions A\. A 2 

— Algorithm: iterate until convergence (i.e., s i + e 2 small enough) 

(a) Find A = coalesce(Ai, A 2 ) and run isotonic regression to minimize f(w) — u T u> + \ ||u>||| such 
that w is compatible with A. 

(b) Find the projection of 0 onto the intersection of B Al (Fi) — u/2 + w/2 and u/2 — w/2 — B A2 (F 2 ) 
using any of the algorithms described in Section [4~4| 

(c) Merge the sets in Aj which are tight for sj, j £ {1, 2}. 

fd) Check optimality by solving mm Cjti .<z A jtij Fj(B jtij -1 U C jti .) - F J (B j ^ i+J _ 1 ) - Sj(C jtij ) for 
ij £ {1,..., nij}. Monitor £\ and e 2 such that Fj(Cj) — sj(Cj) ^ —Sj, j = 1,2. 

(e) If both si and s 2 not optimal, for all C ]tl . which are different from 0 and Aj i , split partitions. 

— Output: w £ M” and Si £ B(Fi), s 2 £ B(F 2 ). 

Given two ordered partitions A\ and A 2 , we obtain si £ B Al (F 1 ) and s 2 £ B A2 (F 2 ) as described in the 
following section. The solution w = u — si — s 2 is optimal if and only if both si £ B{F\) and s 2 £ B(F 2 ). 
When checking the optimality described in Section [T3] we split the partition. As shown in Appendix[B] either 
(a) ||tu||| strictly increases at each iteration, or (b) ||w ||2 remains constant but ||si — S 2 ||i strictly increases. 
This implies that the algorithm is finitely convergent. 
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4.4 Optimizing the “translated Dykstra problem” 


In this section, we describe algorithms to solve step (b) of the active-set algorithm proposed in Section 4.3 
that optimizes the translated Dykstra problem in Eq. ®,i-e-, 


min /i(wi) + f 2 (w 2 ) - (mi+W2 ^ w) + f||wi - 


w 2 


miEW-' 

The corresponding dual optimization problem is given by 


seB' 
sG — o 


u --B- a ^(F 2 ) 


(17) 


(18) 


with the optimality condition s = w 2 — wi. Note that the only link to submodularity is that f\ and f 2 are 
linear functions on W A ' and Yv A ' 2 , respectively. The rest of this section primarily deals with optimizing a 
quadratic program and we present two algorithms in Section 4.4.1 and Section|4.4.2 


4.4.1 Accelerated Dykstra’s algorithm 

In this section, we find the projection of the origin onto the intersection of the translated base polytopes 
obtained by solving the optimization problem in Eq. © given by 

min fi{wi) + f 2 (w 2 ) - iwi+W2 \ (u ~ w) + |||ioi - w 2 \\ 2 , 
wi ew Al 
w 2 ew A2 

using Dykstra’s alternating projection. It can be solved using the following Dykstra’s iterations: 


51, t = n SAl(Fi) {u/2 - w/2 + w 2}t -i), 
wi,t = u/2 - w/2 + w 2>t -i - si,t, 

5 2 , t = (^ 2 ) ( m /2 — w/2 + Wit), 

w 2 ,t = u/2-w/2 + w l j-s 2it , 

with II c denoting the orthogonal projection onto the sets G, solved here by isotonic regression. Note that 
the value of the auxiliary variable w 2 can be warm-started. The algorithm converges linearly for polyhedral 
sets (29]. 

In our simulations, we have used the recent accelerated version of G2. which led to faster convergence. In 
order to monitor convergence, we compute the value of \\u — w — s i ; t — S 2 ,t||i which is equal to zero at 
convergence. The optimization problem can also be decoupled into smaller optimization problems by using 
the knowledge of the face of the base polytopes on which s-| and s 2 lie. See details in the Appendix [D] This 
is still slow to converge in practice and therefore we present an active-set method in the next section. 


4.4.2 Primal active-set method 

In this section, we find the projection of the origin onto the intersection of the translated base polytopes given 
by Eq. ( [T6| using the standard active-set method (27l by solving a set of linear equations. For this purpose, 
we derive the equivalent optimization problems using equality constraints. 
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Aj tl A v A 3 ,3 ,4j,4 .42. 


Figure 5: Bipartite graph to estimate Q(Ai, A 2 ) with A\ having mi = 6 components and A 2 having m 2 = 5. 

The ordered partition, Aj is given by (A, 1 ,..., Aj >m .), where rn 3 is the number of elements in the ordered 
partitions. Let Bji. be defined as (A,,i U ... U Therefore, 


m i / v 

fj( w j ) = ^ v j,ij i)J 


(19) 


"■./ = X! 1 (20) 

*i =i 

with the constraints, Vj,i > ■ ■ ■ > v jtrnr (21) 

On substituting Eq. Eq. (|20|> and Eq. ([2T]> in Eq. (fl6|), we have and equivalent optimization problem: 


m 1 


mm 

V2,l>...>V 2 ,m 0 


v (fi(«,„.) - FiiBu.-.) - 


*1 = 1 
m 2 


* 2=1 

mi 


m 2 / 

+ ]T 




y 2,z 2 


'"-i -i 1 


n=l 

mi m 2 


* 2 =1 


2 1^ 2 .* 2 l V 2,* 2 


H J2 v a^ 2 i i Mi u 2 , i2 . 

i i=l i 2 = l 


This can be written as a quadratic program in x 


Vl 

V2 


with inequality constraints in the following form 


min -x T Q(Ai,A 2 )x + c(Ai 1 A 2 ) T x, (22) 

xe R m l+^2 l 
D(Ai,A2)x'$=0 

where D(Ai, A2) is a sparse matrix of size (mi + m 2 — 2) x (mi + m2), which is a block diagonal matrix 
containing the difference or first order derivative matrices of sizes mi — lx mi and m 2 — lx m 2 as the 
blocks and c(Ai, A 2 ) is a linear vector that can be estimated using the functions evaluations of F\ and F 2 . 
Note that these evaluations need to be done only once. 


Estimating Q(Ai, A 2 ). Let us consider a bipartite graph, G = (A\ . _4 2 , B), with mi + m 2 nodes repre¬ 
senting the ordered partitions of A\ and A 2 respectively. The weight of the edge between each element of 
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ordered partitions of Ai, represented by A t _ t l and each element of ordered partitions of A 2 , represented by 
A 2i2 is the number of elements of the ground set V that lie in both these partitions and can be written as 
e(Ai^ 1 , A 2i i 2 ) = 1^ 1 a 2 i 2 for all e £ E. The matrix Q(Ai,A 2 ) represents the Laplacian matrix of the 

graph G. Figure[5]shows a sample bipartite graph with mi = 6 and m 2 = 5. 

Optimizing the quadratic program in Eq. ( |22[ by using active-set methods is equivalent to finding the face 
of the constraint set on which the optimal solution lies. For this purpose, we need to be able to solve the 
quadratic program in Eq. ([22]) with equality constraints. 


Equality constraint QP. Let us now consider the following quadratic program with equality constraints 

min ^p T Q(Ai 7 A 2 )p+(Q(Ai,A 2 )x k +c(Ai,A 2 )) T p, (23) 

p£R m l+™2 Z 

D'p—O 


where D' is the subset of the constraints in l)(Ai. A 2 ), i.e. indices of its rows that are tight and Xk is a 
primal-feasible point. The vector p gives the direction of strict descent of the cost function in Eq. ( [22] ) from 
feasible point x k E3- 


Without loss of generality, let us assume that the equality constraints are Vj t k = Vj,kj+i f° r any kj in [0, nrij ). 
Let A'j be the new ordered partition formed by merging A h k } and Aj^+i as Vj^j = Vj.kj+J ■ Finding the 
optimal vector p using the quadratic program in Eq. (23 1 with respect to the ordered partition A'j is equivalent 
to solving the following unconstrained quadratic problem, 


Q(A[, A 2 , x t ) = min + (Q^A^A^Xt + c(A[,A 2 )) T p'\ (24) 

p'6l m l +ra 2 / 

where m' is the number of elements of the ordered partition A'j. This can be estimated by solving a linear 
system using conjugate gradient descent. The complexity of each iteration of the conjugate gradient is given 
by O^rn'i + m' 2 )k) where k is the number of non-zero elements in the sparse matrix, Q(A' 1i A' 2 ) ll32l . We 
can build p from p' by repeating the values for the elements of the partition that were merged. 


Primal active-set algorithm. We now describe the standard active-set method to solve the quadratic pro¬ 
gram in Eq. {16]. 


— Input: Laplacian matrix, Q(A \, A 2 ) and vector, c(_4 1 , A 2 ), ordered partitions Ai, A 2 , w. 

— Algorithm: Iterate on t until both primal and dual feasibility conditions in Eq. and Eq. © 
respectively are satisfied. 


— Initialize: xq using w. Working set IT’.S'o with tight sets of Aj. Estimate A \, A', from IVSq. 


(a) 

(b) 


Solve <2(21^, A 2 ,Xt) in Eq. (24 1 to find optimal p. 


Check Primal Feasibility: If p == 0 

1. Estimate Dual: 

A = D(Ai, A 2 )~ t (Q{Ai, A 2 )xt + c(Ai, A 2 )). 

2. Check Dual Feasibility: 

if Ai > 0 for all in i £ WSt 
* return Optimal x* = x t . 

3. else 
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Figure 6: (a) Number of SFM oracle calls for images of various sizes, (b) Time taken for images of various 
sizes, (c) Number of iterations with and without warm start, (d) Average complexity of the oracle with and 
without warm start. 


* Most violated Constraint: 

update j = argmin jeWSt Xj . 

* Update Working Set: 

update WSt+i = WS t \ {j}. 
update A'i , A' 2 from W S t +i. 

* update xt+i = x t 

* Goto step (a). 

4. endif 

(c) else 

1. Line Search: Find least a that retains feasibility of Xt+i = Xt + ap and find the blocking 
constraints B t . 

2. Update Working Set: WS t +\ = WS t U B t and update A\, A! 2 from WS t + 

3. Goto step (a), 
fd) endif 

— Output: x* e K mi+m2 . 


We can estimate w\ and w 2 from x* , which will enable us to estimate s feasible in Eq. ( 18 1 . Therefore we 
can estimate the dual variable si € B Al ( Fi ) and s 2 £ B A2 (F 2 ) using s. 


4.5 Decoupled problem 

In our context, the quadratic program in Eq. ( {22} can be decoupled into smaller optimization problems. Let us 
consider the bipartite graph G = (A \. A 2 . E ) of which Q is the Laplacian matrix. The number of connected 
components of the graph, G is equal to the number of levelsets of w. 

Let m be the total number of connected components in G. These connected components define a partition 
on the ground set V and a total order on elements of the partition can be obtained using the levels sets of 
w. Let k denote the index of each bipartite subgraph of G represented by Gk = (A \j c . A 2> k, Ef.), where 
k = 1,2 ,m. Let Jk denote the indices of the nodes of Gk in G. 

x*j k = argmin \x T Q{A l ,A 2 )j k j k x + (25) 

xeR ”»l,fc+™2,fc ^ 

D(Ai ,.42)fcx^=0 

where rrij^ is size of Aj k- Therefore, m\k + ’in 2 .k is the total number of nodes in the subgraph Gfc. Note 
that this is exactly equivalent to decomposition of base polytope of Fj into base polytopes of submodular 
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Figure 7: (a) Number of 2D SFM calls to obtain 3D SFM, (b) Number of 2D SFM calls to obtain 3D TV, (c) 
Number of 2D SFM calls to obtain SFM of 2D + concave function, (d) Number of 2D SFM calls to obtain 
TV of 2D + concave function. 


functions formed by contracting Fj on each individual components representing the connected component k. 
See Appendix |C| for more details. 


5 Experiments 


In this section, we show the results of the algorithms proposed on various problems. We first consider the 
problem of solving total variation denoising for a non decomposable function using active-set methods in 
Section 5.1 specifically cut functions. In Section [572] we consider cut functions on a 3D grid decomposed 
into a function of the 2D grid and a function of chains. We then consider a 2D grid and a concave function 
on cardinality, which is not a cut function. 


5.1 Non decomposable total variation denoising 

Our experiments consider images, which are 2-dimensional grids with 4-neighborhood. The dataset com¬ 
prises of 6 different images of varying sizes. We consider a large image of size 5616 x 3744 and recursively 
scale into a smaller image of half the width and half the height maintaining the aspect ratio. Therefore, 
the size of each image is four times smaller than the size of the previous image. We restrict to anisotropic 
uniform-weighted total variation to compare with Chambolle et al. GU but our algorithms works as well 
with weighted total variation, which is standard in computer vision, and on any graph with SFM oracles. 
Therefore, the unweighted total variation is 

f(w) = A K - 

where A is a regularizing constant for solving the total variation problem in Eq. (jTJ. 

Maxflow (§] is used as the SFM oracle for checking the optimality of the ordered partitions. Figure [bja) 
shows the number of SFM oracle calls required to solve the TV problem for images of various sizes. Note 
that in nm each SFM oracle call optimizes smaller problems sequentially, while each SFM oracle call in 
our method optimizes several independent smaller problems in parallel. Therefore, our method has lesser 
number of oracle calls than ifTTI . However, oracle complexity of each call is higher for the our method when 
compared to IfTTI . Figure [6|b) shows the time required for each of the methods to solve the TV problem to 
convergence. We have an optimized code and only use the oracle as plugin which takes about 80-85 percent 
of the running time. This is primarily the reason our approach takes more time than DU inspite of having 
lesser oracle calls for small images. 
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Figure[6jc) also shows the ability to warm start by using the output of a related problem, i.e., when computing 
the solution for several values of A (which is typical in practice). In this case, we use optimal ordered 
partitions of the problem with larger A to warm start the problem with smaller A. It can be observed that 
warm start of the algorithm requires lesser number of oracle calls to converge than using the initialization 
with trivial ordered partition. Warm start also largely helps in reducing the burden on the SFM oracle. With 
warm starts the number of ordered partitions does not change much over iterations. Hence, it suffices to 
query only ordered partitions that have changed. To analyze this we define oracle complexity as the ratio of 
pixels in the elements of the partitions that need to be queried with the full set. Oracle complexity is averaged 
over iterations to understand the average burden on the oracle per iteration. With warm starts this reduces 
drastically, which can be observed in Figure[6|d). 


5.2 Decomposable total variation denoising and SFM 


Cut functions. In the decomposable case, we consider a 3D-grid that decomposes into a function F\ com¬ 
posed of 2D grids and a function F 2 composed of chains. For each of these functions, the corresponding SFM 
oracle is a maxflow-mincut (8] and message passing algorithm on chains respectively. The corresponding 
projection algorithm can be solved by using the algorithm described in Section 3.4 We consider averaged 


alternating reflection (AAR) Q by solving each projection without warmstart and counting the total number 
of 2D SFM oracle calls to solve the SFM and TV on the 3D-grid as our baseline. (SGD-P) denotes the dual 
subgradient based method 1201 modified with Polyak’s a rule to solve SFM on the 3D-grid. We show the 
performance of alternating projection (AP-WS), averaged alternating reflection (AAR-WS) j5j and Dykstra’s 
alternating projection (DAP-WS) El using warmstart of each projection with the ordered partitions. WS 
denotes warmstart variant of each of the algorithm. The performance of the active-set algorithm proposed in 
Section |43l with inner loop solved using the primal active-set method proposed in Section |4.4.2| is represented 
by (ACTIVE). 


In our experiments, we consider the 3D volumetric dataset of the Stanford bunny m of size 102 x 100 x 
79. Fi denotes 102 2D frame of size 100 x 79 and F 2 represents the 100 x 79 = 7900 chains of length 
102. Figure [7] (a) and (b) show that (AP-WS), (AAR-WS), (DAP-WS) and (ACTIVE) require relatively less 
number of oracle calls when compared to when compared to AAR or SGD-P. Note that we count 2D SFM 
oracle calls as they are more expensive than the SFM oracles on chains. 


Time comparisons. We also performed time comparisons between the iterative methods and the combi¬ 
natorial methods on standard datasets. The standard mincut-maxflow f8] on the 3D volumetric dataset of the 
Standard bunny fT] of size 102 x 100 x 79 takes 0.11 seconds while averaged alternating reflections (AAR) 
without warmstart takes 0.38 seconds. The averaged alternating reflections with warmstart (AAR-WS) takes 
0.21 seconds and the active-set method (ACTIVE) takes 0.38 seconds. The main bottleneck in the active-set 
method is the inversion of the Laplacian matrix and it could considerably improve by using methods sug¬ 
gested in l32l . Note that the projection on base polytopes of f j and F 2 can be parallelized by projecting onto 
each of the 2D frame of Fj and each line of F 2 respectively ||22| . The times for (AAR), (AAR-WS) and (AC¬ 
TIVE) use parallel multicore architecture^] whi Ie the combinatorial algorithm only uses a single core. Note 
that cut functions on grid structures are only a small subclass of submodular functions with such efficient 
combinatorial algorithms. In constrast, our algorithm works on more general class of sum of submodular 
functions than just cut functions. 


Concave functions on cardinality. In this experiment we consider SFM problem of sum of a 2D cut 
on a graph of size 5616 x 3744 and a super pixel based concave function on cardinality f30l H8l . The 

3 20 core. Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz with lOOGigabytes of memory. We only use up to 16 cores of the machine 
to ensure accurate timings 
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unary potentials of each pixel is calculated using the Gaussian mixture model of the color features. The 
edge weight a(i,j) = exp(— 11— y.j || 2 ), where y., denotes the RGB values of the pixel i. In order to 
evaluate the concave function, regions Rj are extracted via superpixels and, for each Rj, defining the function 
F 2 (S) = |5||f?j \ S\. We use 200 and 500 regions. Figure [7] (c) and (d) shows that (AP-WS), (AAR- 
WS), (DAP-WS) and (ACTIVE) algorithms converge for solving TV quickly by using only SFM oracles and 
relatively less number of oracle calls. Note that we count 2D SFM oracle calls. 


6 Conclusion 

In this paper, we present an efficient active-set algroithm for optimizing quadratic losses regularized by 
Fovasz extesion of a submodular function using the SFM oracle of the function. We also present an active-set 
algorithms to minimize sum of “simple” submodular functions using SFM oracles of the individual “sim¬ 
ple” functions. We also show that these algorithms are competitive to the existing state-of-art algorithms to 
minimize submodular functions. 
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A Algorithms for coalescing partitions 


The basic interpretation in coalescing two ordered partitions is as follows. Given an ordered partition A\ and 
A 2 with mi and m 2 elements in the partitions respectively, we define for each j = 1, 2, Vi,- = (1,. .., rrij), 

Bj,ij = {Aj t 1 U ... U 

The inequalities defining the outer approximation of the base polytopes are given by hyperplanes defined by 

^*7 = (1) • • • > TO j)) s j{Bj,ij ) ^ 

. The hyperplanes defined by common sets of both these partitions, defines the coalesced ordered partitions. 
The following algorithm performs coalescing between these partitions. 

— Input: Ordered partitions A\ and A?. 

— Initialize: x = l, y=l, z = l and C = 0. 

— Algorithm: Iterate until x = m\ and y = m 2 with to = z 

(a) If \B l<x \ > \B 2 , y \ then y := y + 1. 

(b) If \Bi tX \ < \B 2 , y \ then x := x + 1. 

(c) If \B lyX \ == \B 2 , y \ then 

- If B l x == B> y then 

* A z = (B 1<x \ C), 

* C = B i :X , and 

* z := z + 1. 

— Output: to = z, ordered partitions A = (,4 |...., A rn ). 

B Optimality of algorithm for decomposable problems 

In step fd) of the algorithms, when we split partitions, the value of the primal/dual pair of optimization 
algorithms 


max -\\\u- si - S2II2 = mm wew A ' /i( w ) + h( w ) ~ u T w+ ^\\w\\l, 
Sl eB A HF 1 ) we w A i 

s 2 eB A HF 2 ) 


cannot increase. This is because, when splitting, the constraint set for the minimization problem only gets 
bigger. Since at optimality, we have w = u — s\ — s 2 , ||itt ||2 cannot decrease, which shows the first statement. 

Now, if || w || 2 remains constant after an iteration, then it has to be the same (and not only have the same 
norm), because the optimal si and s 2 can only move in the direction orthogonal to w. 

In step (b) of the algorithm, we project 0 on the (non-empty) intersection of B Al (Fi) — u/2 + w/2 and 
u/2 — w/2 — B A2 (F 2 ). This corresponds to minimizing i||si — u/2 + w/2\\ 2 such that si € B Al (Fi) and 
s 2 = u — w — si € B A2 (F 2 ). This is equivalent to minimizing | ||si — s 2 \\ 2 - We have: 


max 

si£B Ai (Fi ) 
s 2 GB A 2(F 2 ) 


Sl~\-S2=U—W 


1 

8 


si-sail! 


min 

Wl ew A i 

w 2 ew A z 


max 

sieR" 

s 2 ei“ 

Si~\-S2=U — W 


- - s 2 ||l + /i(wi) +f 2 (w 2 ) 

-wjsx - 
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(a) (b) (c) 

Figure 8: (a) Total number of inner iterations for varying a. (b) Total number of outer iterations for varying 
a. and (c) Number of inner iterations per each outer iteration for the a = 10 1 . 


min max 

w 2 ew A2 


-||u - w - 2s 2 \\l + /l(wi) + / 2 O 2 ) 
—wj (u — w — S 2 ) — wj s 2 


mi n max ( - *||w - w\\% - *||s 2 ||2 + 0 - w) 

W!£W Al s 2 eR n \ o Z z 

w 2 ew 2 +/ 1 (w 1 ) + f 2 (w 2 ) - wj{u - w - s 2 ) - wjs 2 


min 

Wl ew A i 

w 2 ew A2 


-wj(u-w) + /i(wi) +f 2 (w 2 ) - ^||u- w\\l 
+ max x ||5 2 1 |2 + sj + W! - w 2 ) 


min 

Knew- 41 

W2 ew A2 


min 

Knew- 41 

w 2 ew A2 


wj(u-w) + fl(w!) + f 2 (lV 2 ) - ^I|u - w\\l 

+ \\\^+Vh-™2\\ 2 ^ 

wj(u -w) + fl{wi) + f 2 {w 2 ) + i||wi - w 2 ||| 
+ *(« - ru) T (wi - w 2 ) j 


min 

Knew - 41 

i» 2 ew Al 


fi(wi) + f 2 {w 2 ) - -{u 


+^\m -W2W2 


w) T (w 1 + w 2 ) 


Thus Si and S 2 are dual to certain vectors w 1 and w 2 , which minimize a decoupled formulation in /1 and f 2 . 
To check optimality, like in the single function case, it decouples over the constant sets of w 1 and w 2 , which 
is exactly what step (c) is performing. 

If the check is satisfied, it means that W\ and w 2 are in fact optimal for the problem above without the 
restriction in compatibilities, which implies that they are the Dykstra solutions for the TV problem. 

If the check is not satisfied, then the same reasoning as for the one function case, leads directions of descent 
for the new primal problem above. Hence it decreases; since its value is equal to — | ||si — S2||i> the value of 
ll s i — S 2 II 2 must increase, hence the second statement. 
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C Decoupled problems 


Given that we deal with polytopes, knowing w implies that we know the faces on which we have to looked 
for. It turns outs that for base polytopes, these faces are products of base polytopes for modified functions (a 
similar fact holds for their outer approximations). 

Given the ordered partition A! defined by the level sets of w (which have to be finer than Ai and A2). we 
know that we may restrict (F :l ) to elements s such that s(B) = F(B) for all sup-level sets B of w 
(which have to be unions of contiguous elements of Aj)\ see an illustration below. 


A j, 1 Aj 2 

I— I - —h 


A 


j, 3 


h 


+ 


c 1 


C 2 



] 


More precisely, if Ci,..., C m > are constant sets of w ordered with decreasing values. Then, we may search 
for Sj independently for each subvector ( Sj)c k £ M. Ck , k £ {1,..., m'} and with the constraint that 

( s o)c k £ B AjnCk [(F j ) Ck \c 1 u-uc k _ 1 ], 

where AjfiCk is the ordered partition obtained from Aj once restricted onto Ci, and the submodular function 
is the so-called contraction of F on Ck given C\ U • • • U Ck-i, dehned as S' 1 —> Fj(S U C\ U • • • U Ck- 1 ) — 
F{C\ U • • • U Ck- 1). Thus this corresponds to solving m different smaller subproblems. 


D Choice of a 


The Dykstra step, i.e., step (b) of the algorithm proposed in Section 4.4.1 is not finitely convergent. Therefore. 


it needs to be solved approximately. For this purpose, we introduce a parameter a to approximately solve 
the Dykstra step such that ||si + S 2 — u + u>||i < a(ei + e 2 ). Let e be defined as a{e\ + e 2 ). This shows 
that the si and S 2 are e-accurate. Therefore, a must be chosen in such a way that we avoid cycling in our 
algorithm. However, another alternative is to warm start the dykstra step with w\ and W 2 of the previous 
iteration. This ensures we dont go back to the same w\ and u> 2 , which we have already encountered and 
avoid cycling. Figure [8] shows the performance of our algorithm for a simple problem of 100 x 100 2D-grid 
with 4-neighborhood and uniform weights on the edges with varying a. Figure[8]-(a) shows the total number 
of inner iterations required to solve the TV problem. Figure[8]-(b) gives the total number of SFM oracle calls 
required to solve the TV problem. In Figure [8]-(c), we show the number of inner iterations in every outer 
iteration for the best a we have encountered. 
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